34 research outputs found

    Annotations sémantiques pour le domaine Biopuces

    After highlighting the value of the Semantic Web in the biomedical domain and the contribution of semantic annotations to information retrieval, we present a method for the semi-automatic generation of semantic annotations describing articles in the microarray (biochip) domain, based on information extraction techniques.

    PatClust: une plateforme pour la classification sémantique des brevets

    We present a generic approach to patent classification based on the semantics contained in these documents.

    Querying the Semantic Web of Data using SPARQL, RDF and XML

    The Semantic Web relies on two layers: XML and RDF. XML documents are merely trees that represent structured data or documents and are accessed using XPath. RDF is used for a Web of data and to provide metadata about data or documents; it is organized as graphs made of collections of elementary triples and can be queried using SPARQL. Based on these two paradigms, there exist tools and platforms that produce and process both XML and RDF. In information integration and mash-up applications, there are scenarios where we need to query, compare and integrate data coming from both worlds. In this report we present a seamless way of mixing both paradigms in SPARQL. Generic extensions to SPARQL are explained, and then we provide use cases and an application in semantic annotation of textual documents using NLP techniques.

    Mining Biomedical Texts to Generate Semantic Annotations

    This report focuses on text mining in the biomedical domain for the generation of semantic annotations based on a formal model, namely an ontology. We start by presenting the generic methodology for generating annotations from texts. Then, we present the state of the art in knowledge extraction techniques used on biomedical texts. We propose our approach based on Semantic Web technologies and Natural Language Processing (NLP): it relies on formal ontologies to generate semantic annotations on scientific articles and on other knowledge sources (databases, experiment sheets). This approach can be extended to other domains requiring experiments and massive data analyses. Finally, we conclude with a discussion of our work and present some lessons learnt.

    Vers une personnalisation de la navigation par l'apprentissage de profils utilisateurs.

    Exploiting interactions between users and Web sites can play an important role in improving navigation on the future Web. More specifically, identifying and recognizing the profiles of Internet users from these data can help browsers and Web sites personalize user sessions while recommending specific resources. In this paper we present a profile recognition solution based on Semantic Web technologies. This approach draws its advantages from the use of ontologies, semantic annotations on Web resources, an inference engine and a semantic search engine.

    Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    BACKGROUND: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively. RESULTS: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The 'Term Cooc' approach performs better on Gene Ontology (92% success) than on MeSH (73% success) as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average 80% success rate. CONCLUSION: Metadata is valuable for disambiguation, but requires high quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently modelled ontology. Overall, the results show that well structured ontologies can play a very important role to improve disambiguation. AVAILABILITY: The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1 (1471-2105-10-28-S1.txt), "Benchmark datasets used in the experiments". The three corpora (High quality/Low quantity corpus; Medium quality/Medium quantity corpus; Low quality/High quantity corpus) are given in the form of PubMed identifiers (PMID) for True/False cases for the 7 ambiguous terms examined (GO/MeSH/UMLS identifiers are also given).
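
    The 'Closest Sense' idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the term labels, the function names and the choice to take the minimum hop count over all co-occurring terms are all assumptions made for illustration only.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (term, term) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def closest_sense(graph, senses, context_terms):
    """Return the candidate sense nearest to any co-occurring term.

    Assumption: 'nearest' means the smallest hop count over all context terms.
    Returns None when no sense can reach any context term.
    """
    best_sense, best_dist = None, None
    for sense in senses:
        for term in context_terms:
            dist = shortest_path_length(graph, sense, term)
            if dist is not None and (best_dist is None or dist < best_dist):
                best_sense, best_dist = sense, dist
    return best_sense


# Hypothetical toy ontology with two senses of an ambiguous label.
TOY_EDGES = [
    ("development (biological)", "biological process"),
    ("biological process", "cell differentiation"),
    ("development (software)", "engineering"),
    ("engineering", "programming"),
]
TOY_GRAPH = build_graph(TOY_EDGES)
TOY_SENSES = ["development (biological)", "development (software)"]
```

    Calling `closest_sense(TOY_GRAPH, TOY_SENSES, ["cell differentiation"])` selects the biological sense, because only that sense is connected to the co-occurring term in the toy graph.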

    Semantic Web Applications and Tools for Life Sciences, 2008 – Introduction

    BACKGROUND: Semantically-enriched browsing has enhanced the browsing experience by providing contextualized dynamically generated Web content, and quicker access to searched-for information. However, adoption of Semantic Web technologies is limited, and perception among users outside the IT domain is sceptical. Furthermore, little attention has been given to evaluating semantic browsers with real users to demonstrate the enhancements and obtain valuable feedback. The Sealife project investigates semantic browsing and its application to the life science domain. Sealife's main objective is to develop the notion of context-based information integration by extending three existing Semantic Web browsers (SWBs) to link the existing Web to the eScience infrastructure. METHODS: This paper describes a user-centred evaluation framework that was developed to evaluate the Sealife SWBs and that elicited feedback on users' perceptions of ease of use and information findability. Three sources of data, i) web server logs, ii) user questionnaires and iii) semi-structured interviews, were analysed and comparisons made between each browser and a control system. RESULTS: It was found that the evaluation framework used successfully elicited users' perceptions of the three distinct SWBs. The results indicate that the browser with the most mature and polished interface was rated higher for usability, and semantic links were used by the users of all three browsers. CONCLUSION: Confirmation or contradiction of our original hypotheses with relation to SWBs is detailed along with observations of implementation issues.

    Towards a breakthrough speaker identification approach for law enforcement agencies

    This paper describes a high-performance, innovative and sustainable Speaker Identification (SID) solution that runs over a large database of voice samples. The solution is based on the development, integration and fusion of a series of speech analytic algorithms, which include speaker model recognition, gender identification, age identification, language and accent identification, and keyword and taxonomy spotting. A fully integrated system is proposed, ensuring multisource data management, advanced voice analysis, information sharing, and efficient and consistent man-machine interactions.